Abstract

Consider the n-dimensional vector $y = X\beta + e$, where $\beta \in \mathbb{R}^p$ has only k nonzero entries
and $e \in \mathbb{R}^n$ is Gaussian noise. This can be viewed as a linear system with sparsity
constraints, corrupted by noise. We find a non-asymptotic upper bound on the probability that
the optimal decoder for $\beta$ declares a wrong sparsity pattern, given any generic perturbation
matrix X. In the case when X is randomly drawn from a Gaussian ensemble, we obtain
asymptotically sharp sufficient conditions for exact recovery, which agree with the known
necessary conditions previously established.

Keywords: Subset selection, compressive sensing, information theoretic bounds, random projections.



1 Introduction 

A wide array of problems in science and technology reduce to finding solutions to underdetermined
systems of equations, particularly to systems of linear equations with fewer equations
than unknowns; examples include array signal processing [1], neural [2] and genomic data
analysis [3], to name a few. In many of these applications, it is natural to seek sparse solutions of
such systems, i.e., solutions with few nonzero elements. A common setting is when we believe or
know a priori that only a small subset of the candidate sources, neurons, or genes influence
the observations, but their location is unknown.

More concretely, the problem we consider is that of estimating the support of $\beta \in \mathbb{R}^p$,
given the a priori knowledge that only k of its entries are nonzero, and based on the following
observational model,

$$y = X\beta + e, \qquad (1)$$

where $X \in \mathbb{R}^{n \times p}$ is a collection of perturbation vectors, $y \in \mathbb{R}^n$ is the output measurement and
$e \in \mathbb{R}^n$ is the additive measurement noise, assumed to be zero mean and with known covariance
equal to $I_{n \times n}$; this entails no loss of generality, by standard rescaling of $\beta$. Each row of X and
the corresponding entry of y are viewed as an input perturbation and output measurement,
respectively. For that reason, n designates the number of measurements, p the number of features and k
the number of relevant features. As mentioned earlier, the main problem is to optimally estimate the
set of nonzero entries of $\beta$, i.e., the sparsity pattern, based on the n-dimensional observation
vector y and the $(n \times p)$ perturbation matrix X, and to study conditions on the key parameters
that guarantee (asymptotically) that the sparsity pattern is recovered reliably. The geometric
structure of the problem is represented by p and k, whereas the size of the measurements and the
signal-to-noise ratio are given by n and $\|\beta\|^2$, respectively. Therefore, $(n, p, k, \|\beta\|^2)$ may be
viewed as the key parameters that asymptotically determine whether reliable sparsity pattern
recovery is possible or not. The aforementioned question can be posed in terms of $(n, p, k, \beta_{\min})$,
where $\beta_{\min} = \min_{i : \beta_i \neq 0} |\beta_i|$, upon noting that $\|\beta\|^2 \geq k\beta_{\min}^2$.

A large body of recent work, including [4, 5, 6, 7, 8], analyzed reliable sparsity pattern
recovery exploiting optimal and sub-optimal decoders for large random Gaussian perturbation
matrices. The average error probability, together with necessary and sufficient conditions for
sparsity pattern recovery, has been analyzed for Gaussian perturbation matrices. As a generalization
of that work, necessary conditions for general random and sparse perturbation matrices were
presented in [3]. Various performance metrics regarding the sparsity pattern estimate were
examined in [6]. We discuss the relationship to this work in more depth below, after
describing our analysis and results in more detail.

The output of the optimal (sparsity) decoder is defined as the support set of the sparse
solution $\hat\beta$ with support size k that minimizes the residual sum of squares, where

$$\hat\beta = \arg\min_{|\mathrm{support}(\theta)| = k} \|y - X\theta\|^2 \qquad (2)$$

is the optimal estimate of $\beta$ given the a priori information of sparseness. The support set of $\hat\beta$
is optimal in the sense of minimizing the probability of identifying a wrong sparsity pattern.
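The decoder in equation (2) can be sketched as an exhaustive search over all supports of size k. The snippet below is only an illustration under stated assumptions: the helper names (residual_norm_sq, optimal_support), the toy sizes and the noiseless toy data are hypothetical and not taken from the paper, and the projection is computed with a plain Gram-Schmidt pass so that only the Python standard library is needed.

```python
import itertools
import math
import random


def residual_norm_sq(columns, y):
    """Squared norm of y after removing its projection onto span(columns)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:  # orthogonalize against the basis built so far
            coef = sum(a * b for a, b in zip(q, v))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip numerically dependent columns
            basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        coef = sum(a * b for a, b in zip(q, r))
        r = [a - coef * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def optimal_support(X_cols, y, k):
    """Size-k support with the smallest residual sum of squares (exhaustive search)."""
    p = len(X_cols)
    return min(
        itertools.combinations(range(p), k),
        key=lambda S: residual_norm_sq([X_cols[j] for j in S], y),
    )


# Toy instance; every size and value here is an illustrative assumption.
rng = random.Random(0)
n, p, k = 12, 6, 2
X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
true_support = (1, 4)
beta = {1: 3.0, 4: -3.0}
y_noiseless = [sum(beta[j] * X_cols[j][i] for j in true_support) for i in range(n)]
estimated_support = optimal_support(X_cols, y_noiseless, k)
```

The search visits all $\binom{p}{k}$ candidate supports, so it is practical only for very small p; it is meant to make the definition concrete, not to be an efficient algorithm.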

Below, we first present an upper bound on the probability of declaring a wrong sparsity
pattern based on the optimum decoder, as a function of the perturbation matrix X. Second,
we exploit this upper bound to find asymptotic sufficient conditions on $(n, p, k, \beta_{\min})$ for reliable
sparsity recovery, in the case when the entries of the perturbation matrix are independent
and identically distributed (i.i.d.) normal random variables. Finally, we show that our results
strengthen earlier sufficient conditions, including those of [7], and we establish the sharpness of these
sufficient conditions in both the linear, i.e., $k = \Theta(p)$, and the sub-linear, i.e., $k = o(p)$, regimes,
for various scalings of $\beta_{\min}$.

Notation. The following conventions will remain in effect throughout this paper. Calligraphic
letters are used to indicate sparsity patterns, defined as sets of integers between 1
and p with cardinality k. We say $\beta \in \mathbb{R}^p$ has sparsity pattern $\mathcal{T}$ if only entries with indices
$i \in \mathcal{T}$ are nonzero. $\mathcal{T} - \mathcal{F}$ stands for the set of entries that are in $\mathcal{T}$ but not in $\mathcal{F}$, and $|\mathcal{T}|$ for
the cardinality of $\mathcal{T}$. We generally denote by $X_{\mathcal{T}} \in \mathbb{R}^{n \times |\mathcal{T}|}$ the matrix obtained from X by
extracting the $|\mathcal{T}|$ columns with indices $i \in \mathcal{T}$. Let $S(\beta)$ stand for the sparsity pattern or
support set of $\beta$. All norms are $\ell_2$, i.e., $\|\cdot\| = \|\cdot\|_2$.

1.1 Results 

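For the observational model in equation (1), assume that the true sparsity model is $\mathcal{T}$, so that

$$y = X_{\mathcal{T}}\beta_{\mathcal{T}} + e. \qquad (3)$$

We first state a result on the probability of the event $S(\hat\beta) = \mathcal{F}$ for any $\mathcal{F} \neq \mathcal{T}$ and any
perturbation matrix X.

Theorem 1. For the observational model of equation (3) and the estimate $\hat\beta$ in equation (2), the
conditional probability $\Pr[S(\hat\beta) = \mathcal{F} \mid X, \beta, \mathcal{T}]$ that the decoder declares $\mathcal{F}$ when $\mathcal{T}$ is the true
sparsity pattern is bounded above by

$$\exp\left(-c\,\|(I - \Pi_{\mathcal{F}}) X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 + \frac{d}{2}\right),$$

where $c = \frac{3 - 2\sqrt{2}}{2}$, $d = |\mathcal{T} - \mathcal{F}|$ and $\Pi_{\mathcal{F}} = X_{\mathcal{F}}(X_{\mathcal{F}}^T X_{\mathcal{F}})^{-1} X_{\mathcal{F}}^T$.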

The proof of Theorem 1, given in Section 2.1, employs the Chernoff technique and the
properties of the eigenvalues of the difference of projection matrices to bound the probability of
declaring a wrong sparsity pattern $\mathcal{F}$ instead of the true one $\mathcal{T}$ as a function of the perturbation
matrix X and the true parameter $\beta$. The error rate decreases exponentially in the squared norm of the
projection of $X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}$ onto the orthogonal complement of the subspace spanned by the columns of $X_{\mathcal{F}}$. This is
in agreement with the intuition that the closer the subspaces corresponding to different
sets of columns of X are, the harder it is to differentiate them, and hence the higher the error
probability will be.
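To make the role of the projection concrete, the following sketch evaluates an exponential bound of the form $\exp(-c\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 + d/2)$ with $c = (3 - 2\sqrt{2})/2$ (the constant obtained in the proof in Section 2.1) for a small random instance. The function names, the dictionary representation of $\beta$ and the toy data are assumptions made for illustration only.

```python
import math
import random

C_CONST = (3 - 2 * math.sqrt(2)) / 2  # constant c from the Chernoff step in Section 2.1


def project_out(columns, v):
    """Return (I - Pi) v, where Pi projects onto span(columns) (Gram-Schmidt)."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            coef = sum(a * b for a, b in zip(q, w))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:
            basis.append([a / norm for a in w])
    r = list(v)
    for q in basis:
        coef = sum(a * b for a, b in zip(q, r))
        r = [a - coef * b for a, b in zip(r, q)]
    return r


def theorem1_bound(X_cols, beta, true_support, wrong_support):
    """exp(-c * ||(I - Pi_F) X_{T-F} beta_{T-F}||^2 + d / 2) with d = |T - F|."""
    missed = sorted(set(true_support) - set(wrong_support))  # the index set T - F
    n = len(X_cols[0])
    v = [sum(beta[j] * X_cols[j][i] for j in missed) for i in range(n)]
    r = project_out([X_cols[j] for j in wrong_support], v)
    return math.exp(-C_CONST * sum(a * a for a in r) + len(missed) / 2)


# Toy instance; sizes and coefficients are illustrative assumptions.
rng = random.Random(1)
n, p = 30, 8
X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
T, F = (0, 1, 2), (0, 1, 5)
weak = theorem1_bound(X_cols, {0: 1.0, 1: 1.0, 2: 1.0}, T, F)
strong = theorem1_bound(X_cols, {0: 1.0, 1: 1.0, 2: 4.0}, T, F)
```

Note that the bound can exceed one for weak signals, in which case it is vacuous; it becomes informative once the projected signal energy is large relative to d.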

The theorem below gives a non-asymptotic bound on the probability of the event $S(\hat\beta) \neq \mathcal{T}$,
when the entries of the perturbation matrix X are drawn i.i.d. from a normal distribution.

Theorem 2. For the observational model of equation (3) and the estimate $\hat\beta$ in equation (2),
if the entries of X are i.i.d. $N(0, 1)$, $p > 2k$,

(--^)/3min>4^f^, (4) 



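and

$$n - k > C \max\left\{\frac{\log(k(p-k))}{\log(1 + \beta_{\min}^2)},\ \frac{k\log\left(\frac{p-k}{k}\right) + \log k}{\log(1 + k\beta_{\min}^2)}\right\},$$

then

$$\Pr[S(\hat\beta) \neq \mathcal{T}] \leq k\,e^{5/2} \max\left\{(p-k)^{-\delta},\ \left(\frac{e(p-k)}{k}\right)^{-k\delta}\right\}$$

for $\delta = \frac{C - 5}{2}$.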

The proof of Theorem 2, given in Section 2.2, uses the union bound together with counting
arguments similar in spirit to those of [5] to bound the probability of error of the optimal decoder.

If we let $n(p)$, $k(p)$ and $\beta_{\min}(p)$ scale as functions of p, then the upper bound on $\Pr[S(\hat\beta) \neq \mathcal{T}]$
scales like $k(p-k)^{-\delta}$. For $\delta > 2$ or, equivalently, $C > 9$, the probability of error as $p \to \infty$ is
bounded above by $p^{-D}$ for some $D > 1$. Therefore, the following sum,

$$\sum_{p=1}^{\infty} \Pr[S(\hat\beta_{p \times 1}) \neq \mathcal{T}_p], \qquad (5)$$

is finite, and as a consequence of the Borel-Cantelli lemma, for large enough p, the decoder declares
the true sparsity pattern almost surely. In other words, the estimate $\hat\beta$ based on (2) achieves
the same loss as an oracle which is supplied with perfect information about which coefficients
of $\beta$ are nonzero. The following corollary summarizes the aforementioned statements.

Corollary 3. For the observational model of equation (3) and the estimate $\hat\beta$ in equation (2),
let n, k and $\beta_{\min}$ scale as functions of p such that condition (4) holds. Then there exists
a constant $C^*$ such that, if

$$n > C^* \max\left\{\frac{\log(p-k)}{\log(1 + \beta_{\min}^2)},\ \frac{k\log\left(\frac{p}{k}\right)}{\log(1 + k\beta_{\min}^2)},\ k\right\},$$

then a.s. for large enough p, $\hat\beta$ achieves the same performance loss as an oracle which is supplied
with perfect information about which coefficients of $\beta$ are nonzero, and $S(\hat\beta) = \mathcal{T}$.



The sufficient conditions in Corollary 3 can be compared against similar conditions for
exact sparsity pattern recovery in earlier work; for example, in the sub-linear regime $k = o(p)$,
earlier work including [8] proved that $n = \Theta(k\log(p/k))$ is sufficient, and [6, 7] proved that
$n = \Theta(k\log(p-k))$ is sufficient. In that vein, the condition of Corollary 3
suffices to ensure exact sparsity pattern recovery and, therefore, it strengthens these earlier
results.



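Scaling | Sufficient condition (Corollary 3) | Necessary condition (Theorem 4 [4])
$k = \Theta(p)$, $\beta_{\min}^2 = \Theta(1/k)$ | $n = \Theta(p\log p)$ | $n = \Theta(p\log p)$
$k = \Theta(p)$, $\beta_{\min}^2 = \Theta(\log k / k)$ | $n = \Theta(p)$ | $n = \Theta(p)$
$k = \Theta(p)$, $\beta_{\min}^2 = \Theta(1)$ | $n = \Theta(p)$ | $n = \Theta(p)$
$k = o(p)$, $\beta_{\min}^2 = \Theta(1/k)$ | $n = \Theta(p\log(p-k))$ | $n = \Theta(p\log(p-k))$
$k = o(p)$, $\beta_{\min}^2 = \Theta(\log k / k)$ | $n = \Theta\left(\frac{k\log(p/k)}{\log\log k}\right)$ | $n = \Theta\left(\frac{k\log(p/k)}{\log\log k}\right)$
$k = o(p)$, $\beta_{\min}^2 = \Theta(1)$ | $n = \max\left\{\Theta\left(\frac{k\log(p/k)}{\log k}\right), \Theta(k)\right\}$ | $n = \max\left\{\Theta\left(\frac{k\log(p/k)}{\log k}\right), \Theta(k)\right\}$

Table 1: Tight necessary and sufficient conditions on the number of measurements n required
for reliable support recovery in different regimes of interest.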

What remains is to see whether the sufficient conditions in Corollary 3 match the necessary
conditions proved in [3]:

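Theorem 4. Suppose that the entries of the perturbation matrix $X \in \mathbb{R}^{n \times p}$ are drawn
i.i.d. from any distribution with zero mean and variance one. Then a necessary condition for
asymptotically reliable recovery is that

$$n > \max\{f_1(k, p, \beta_{\min}),\ f_2(k, p, \beta_{\min}),\ k - 1\},$$

where

$$f_1(k, p, \beta_{\min}) = \frac{\log\binom{p}{k} - 1}{\frac{1}{2}\log\left(1 + k\beta_{\min}^2\left(1 - \frac{k}{p}\right)\right)}, \qquad f_2(k, p, \beta_{\min}) = \frac{\log(p - k + 1) - 1}{\frac{1}{2}\log\left(1 + \beta_{\min}^2\left(1 - \frac{1}{p-k+1}\right)\right)}.$$
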
The necessary condition in Theorem 4 asymptotically resembles the sufficient condition in
Corollary 3; recall that $\log\binom{p}{k} \leq k\log\left(\frac{pe}{k}\right)$. The sufficient conditions of Corollary 3 can be
compared against the necessary conditions in [4] for exact sparsity pattern recovery, as shown
in Table 1. We obtain tight sufficient conditions which match the necessary conditions in the
regimes of linear and sub-linear signal sparsity, under various scalings of the minimum value $\beta_{\min}$.
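As a rough numerical illustration of how the necessary condition behaves, the sketch below evaluates a lower bound of the form $\max\{f_1, f_2, k-1\}$ with $f_1 = (\log\binom{p}{k} - 1) / (\frac{1}{2}\log(1 + k\beta_{\min}^2(1 - k/p)))$ and $f_2 = (\log(p-k+1) - 1) / (\frac{1}{2}\log(1 + \beta_{\min}^2(1 - 1/(p-k+1))))$. This is one reading of the expressions in Theorem 4 and should be checked against the cited reference; the function names and the sample values of p, k and $\beta_{\min}$ are assumptions for illustration.

```python
import math


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def necessary_measurements(p, k, beta_min):
    """Evaluate max{f1, f2, k - 1} for the assumed forms of f1 and f2."""
    b2 = beta_min ** 2
    f1 = (log_binom(p, k) - 1) / (0.5 * math.log(1 + k * b2 * (1 - k / p)))
    f2 = (math.log(p - k + 1) - 1) / (
        0.5 * math.log(1 + b2 * (1 - 1 / (p - k + 1)))
    )
    return max(f1, f2, k - 1)


# Illustrative values only: a weaker minimum signal demands more measurements.
n_weak = necessary_measurements(1000, 10, 0.5)
n_mid = necessary_measurements(1000, 10, 1.0)
n_strong = necessary_measurements(1000, 10, 2.0)
```

Both $f_1$ and $f_2$ decrease as $\beta_{\min}$ grows, so the required number of measurements shrinks toward the floor $k - 1$ for strong signals.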

2 Proof of Theorems 
2.1 Theorem 1 

For a given sparsity pattern $\mathcal{F}$, the minimum residual sum of squares is

$$\min_{\theta_{\mathcal{F}}} \|y - X_{\mathcal{F}}\theta_{\mathcal{F}}\|^2 = \|y - \Pi_{\mathcal{F}}\,y\|^2,$$

where $\Pi_{\mathcal{F}}$ denotes the orthogonal projection operator onto the column space of $X_{\mathcal{F}}$; among all
sparsity patterns with size k, the optimum decoder declares

$$\hat{\mathcal{T}}(y, X) = \arg\min_{|\mathcal{F}| = k} \|y - \Pi_{\mathcal{F}}\,y\|^2$$

as the optimum estimate of the true sparsity pattern in terms of minimum error probability.
Recall the definition of $\hat\beta$ in equation (2) and note that $S(\hat\beta) = \hat{\mathcal{T}}(y, X)$. The
decoder can incorrectly declare $\mathcal{F}$ instead of the true sparsity pattern (namely $\mathcal{T}$) only if

$$\|y - \Pi_{\mathcal{F}}\,y\|^2 < \|y - \Pi_{\mathcal{T}}\,y\|^2,$$

or equivalently,

$$Z_{\mathcal{F}} := y^T(\Pi_{\mathcal{F}} - \Pi_{\mathcal{T}})\,y > 0.$$

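The rest of the proof reduces to finding an upper bound on the probability that $Z_{\mathcal{F}} > 0$ with
the aid of the Chernoff technique:

$$\Pr[Z_{\mathcal{F}} > 0 \mid X, \beta, \mathcal{T}] \leq \inf_{|t| < 1/2} E\left[e^{tZ_{\mathcal{F}}} \mid X, \beta, \mathcal{T}\right] \leq \exp\left(-c\,\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 + \frac{d}{2}\right).$$

The infimum is taken over $|t| < 1/2$ to guarantee boundedness of the expectation; the minimizing
t found below is positive, as the Chernoff bound requires. The last
inequality, proven in the next lemma, concludes the proof.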

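Lemma 5. For $y \sim N(X_{\mathcal{T}}\beta_{\mathcal{T}}, I)$ define $Z = y^T(\Pi_{\mathcal{F}} - \Pi_{\mathcal{T}})\,y$ and let $|\mathcal{T} - \mathcal{F}| = d$. Then:

$$\inf_{|t| < 1/2} \log E\left[e^{tZ} \mid X, \mathcal{T}, \beta\right] \leq \frac{d}{2} - c\,\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2,$$

where $c = \frac{3 - 2\sqrt{2}}{2}$.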



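Proof. Note that for $y \sim N(\mu, I)$ and a symmetric matrix $\Psi$, Gaussian integrals yield:

$$E\left[e^{t y^T \Psi y}\right] = (2\pi)^{-n/2} \int e^{t y^T \Psi y - \frac{1}{2}\|y - \mu\|^2}\,dy = \frac{e^{2t^2 \mu^T \Psi (I - 2t\Psi)^{-1} \Psi \mu + t \mu^T \Psi \mu}}{\det(I - 2t\Psi)^{1/2}} \int \frac{e^{-\frac{1}{2}\|(I - 2t\Psi)^{1/2}(y - \mu - e_0)\|^2}}{(2\pi)^{n/2}\det(I - 2t\Psi)^{-1/2}}\,dy,$$

where $e_0 = 2t(I - 2t\Psi)^{-1}\Psi\mu$. The remaining integral is that of a Gaussian density and equals one. Thus,

$$\log E\left[e^{t y^T \Psi y}\right] = 2t^2 \mu^T \Psi (I - 2t\Psi)^{-1} \Psi \mu + t \mu^T \Psi \mu - \frac{1}{2}\log\det(I - 2t\Psi).$$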



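Substituting $\mu = X_{\mathcal{T}}\beta_{\mathcal{T}}$ and $\Psi = \Pi_{\mathcal{F}} - \Pi_{\mathcal{T}}$ we obtain,

$$\mu^T \Psi \mu = -\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}}\beta_{\mathcal{T}}\|^2 = -\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2, \qquad (6)$$

where the second equality holds because $I - \Pi_{\mathcal{F}}$ annihilates the columns of $X_{\mathcal{F}}$, and similarly, we have,

$$\mu^T \Psi^T \Psi \mu = \|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2. \qquad (7)$$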



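Therefore,

$$\begin{aligned}
\log E\left[e^{tZ}\right] &= 2t^2 \mu^T \Psi (I - 2t\Psi)^{-1} \Psi \mu + t \mu^T \Psi \mu - \tfrac{1}{2}\log\det(I - 2t\Psi) \\
&\leq 2t^2 \|(I - 2t\Psi)^{-1/2}\|^2\, \mu^T \Psi^T \Psi \mu + t \mu^T \Psi \mu - \tfrac{1}{2}\log\det(I - 2t\Psi) \\
&= \left\{2t^2 \|(I - 2t\Psi)^{-1/2}\|^2 - t\right\} \|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 - \tfrac{1}{2}\log\det(I - 2t\Psi) \\
&\leq \left\{2t^2 \|(I - 2t\Psi)^{-1/2}\|^2 - t\right\} \|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 - \tfrac{d}{2}\log(1 - 4t^2) \\
&\leq \left(\frac{2t^2}{1 - 2t} - t\right) \|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 - \tfrac{d}{2}\log(1 - 4t^2). \qquad (8)
\end{aligned}$$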



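The first inequality follows by an application of the Cauchy-Schwarz inequality and the second
equality follows from equations (6) and (7). Regarding the third and fourth inequalities, note that the
top eigenvalue of $\Psi = \Pi_{\mathcal{F}} - \Pi_{\mathcal{T}}$ is bounded by one and therefore $I - 2t\Psi$ is positive definite
for $|t| < 1/2$. The difference of projection matrices $\Pi_{\mathcal{F}} - \Pi_{\mathcal{T}}$ has $d = |\mathcal{T} - \mathcal{F}|$ pairs of nonzero
positive and negative eigenvalues, bounded above by one and bounded below by negative one,
respectively, and equal in magnitude. Letting the d positive eigenvalues of $\Pi_{\mathcal{F}} - \Pi_{\mathcal{T}}$ be denoted
by $\lambda_1, \ldots, \lambda_d$,

$$\log\det(I - 2t\Psi) = \sum_{i=1}^{d} \{\log(1 - 2t\lambda_i) + \log(1 + 2t\lambda_i)\} = \sum_{i=1}^{d} \log(1 - 4t^2\lambda_i^2) \geq d\log(1 - 4t^2).$$

Furthermore, for $0 \leq t < 1/2$,

$$\|(I - 2t\Psi)^{-1/2}\|^2 = \max_{1 \leq i \leq d} (1 - 2t\lambda_i)^{-1} \leq (1 - 2t)^{-1},$$

which yields the fourth inequality. Finally, since inequality (8) is true for any such t, we take
the infimum of $\frac{2t^2}{1 - 2t} - t$ over $|t| < 1/2$, which is equal to $\sqrt{2} - 3/2$ at $t = \frac{1}{2}\left(1 - \frac{\sqrt{2}}{2}\right)$, and
obtain the desired bound:

$$\inf_{|t| < 1/2} \log E\left[e^{tZ}\right] \leq -\frac{3 - 2\sqrt{2}}{2}\,\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 - \frac{d}{2}\log\left(\sqrt{2} - \frac{1}{2}\right) \leq \frac{d}{2} - c\,\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2,$$

where the last step uses $-\log(\sqrt{2} - 1/2) \leq 1$.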
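The constants in the final step can be checked numerically. The short script below, offered as a sanity check rather than as part of the proof, evaluates $g(t) = \frac{2t^2}{1-2t} - t$ on a grid over $|t| < 1/2$ and compares the grid minimizer with the closed-form values quoted above; the grid resolution and the variable names are arbitrary choices.

```python
import math


def g(t):
    """Coefficient of the squared projection norm in inequality (8)."""
    return 2 * t * t / (1 - 2 * t) - t


# Closed-form minimizer and minimum value quoted in the proof.
t_star = 0.5 * (1 - math.sqrt(2) / 2)
g_star = math.sqrt(2) - 1.5

# Brute-force check on a uniform grid strictly inside (-1/2, 1/2).
grid = [i / 10000 for i in range(-4999, 5000)]
t_grid = min(grid, key=g)

# Quantity inside the logarithm of the d/2 term, evaluated at the minimizer.
log_argument = 1 - 4 * t_star ** 2
c = -g_star  # c = (3 - 2 * sqrt(2)) / 2, approximately 0.0858
```

The grid minimizer lands within one grid step of $t^* \approx 0.1464$, and $1 - 4(t^*)^2$ agrees with $\sqrt{2} - 1/2$, which is why the logarithmic term in the last display takes that form.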

2.2 Theorem 2 

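First, to find conditions under which $\Pr[E_p]$ asymptotically goes to zero, with $E_p$ defined as the
event that $S(\hat\beta)$ is not equal to $\mathcal{T}$, we exploit the union bound in conjunction with counting
arguments and Lemma 6, proved below. We have:

$$\begin{aligned}
\Pr[E_p] &= \Pr\left[\cup_{\mathcal{F} \neq \mathcal{T}} \{Z_{\mathcal{F}} > 0\}\right] \\
&\leq \sum_{\mathcal{F} \neq \mathcal{T}} \Pr[Z_{\mathcal{F}} > 0] \\
&= \sum_{d=1}^{k} \sum_{|\mathcal{F} - \mathcal{T}| = d} \Pr[Z_{\mathcal{F}} > 0] \\
&\leq \sum_{d=1}^{k} \sum_{|\mathcal{T} - \mathcal{F}| = d} e^{-\frac{n-k}{2}\log(1 + 2c\|\beta_{\mathcal{T}-\mathcal{F}}\|^2) + \frac{d}{2}} \\
&\leq \sum_{d=1}^{k} \binom{k}{d}\binom{p-k}{d}\, e^{-\frac{n-k}{2}\log(1 + 2cd\beta_{\min}^2) + \frac{d}{2}} \\
&\leq \sum_{d=1}^{k} e^{d\left[\frac{5}{2} + \log\frac{k(p-k)}{d^2}\right] - \frac{n-k}{2}\log(1 + 2cd\beta_{\min}^2)} \\
&\leq k\, e^{\max\left\{\frac{5}{2} + \log(k(p-k)) - \frac{n-k}{2}\log(1 + 2c\beta_{\min}^2),\ k\left[\frac{5}{2} + \log\frac{p-k}{k}\right] - \frac{n-k}{2}\log(1 + 2ck\beta_{\min}^2)\right\}}. \qquad (9)
\end{aligned}$$

The first inequality is the union bound, and the equality that follows groups the wrong patterns by
the number d of true indices they miss. The bound on each term is the content of Lemma 6 below. The
next inequality follows from the definition of $\beta_{\min}$, which gives $\|\beta_{\mathcal{T}-\mathcal{F}}\|^2 \geq d\beta_{\min}^2$, and from the
observation that there are $\binom{k}{d}\binom{p-k}{d}$ sparsity patterns that differ in exactly d elements from $\mathcal{T}$.
For the inequality after that, recall that $\log\binom{a}{b} \leq b\log\left(\frac{ae}{b}\right)$. Finally, the
last inequality follows from the convexity of the function

$$f(d) := d\left[\frac{5}{2} + \log\frac{k(p-k)}{d^2}\right] - \frac{n-k}{2}\log(1 + 2cd\beta_{\min}^2),$$

which holds under condition (4) of Theorem 2, referred to as inequality (10) in the remainder of this proof.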
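The counting step can be verified on a small case. The sketch below, with hypothetical function names and toy sizes chosen only for illustration, compares the closed-form count $\binom{k}{d}\binom{p-k}{d}$ with a brute-force enumeration and checks the bound $\log\left[\binom{k}{d}\binom{p-k}{d}\right] \leq d\left[2 + \log\frac{k(p-k)}{d^2}\right]$ that results from applying $\log\binom{a}{b} \leq b\log(ae/b)$ to each factor.

```python
import itertools
import math


def count_patterns(p, k, d):
    """Patterns F of size k with |T - F| = d: drop d true indices, add d false ones."""
    return math.comb(k, d) * math.comb(p - k, d)


def log_count_upper_bound(p, k, d):
    """d * [2 + log(k (p - k) / d^2)], from log C(a, b) <= b log(a e / b)."""
    return d * (2 + math.log(k * (p - k) / d ** 2))


def brute_force_count(p, k, d):
    """Enumerate every size-k pattern and count those missing exactly d true indices."""
    true_pattern = set(range(k))  # without loss of generality, T = {0, ..., k-1}
    return sum(
        1
        for pattern in itertools.combinations(range(p), k)
        if len(true_pattern - set(pattern)) == d
    )


p, k = 9, 3  # small illustrative sizes satisfying p > 2k
counts = [count_patterns(p, k, d) for d in range(k + 1)]
enumerated = [brute_force_count(p, k, d) for d in range(k + 1)]
```

Summing the counts over $d = 0, \ldots, k$ recovers $\binom{p}{k}$ (Vandermonde's identity), which confirms that the union bound in (9) covers every wrong pattern exactly once.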



As a consequence of convexity, the maximum of $f(\cdot)$ over $1 \leq d \leq k$ is attained at the boundary, that is, at $d = 1$
or $d = k$. To see that $f(d)$ is convex, taking derivatives yields,

$$f'(d) = \frac{1}{2} + \log\frac{k(p-k)}{d^2} - \frac{c\beta_{\min}^2(n-k)}{1 + 2cd\beta_{\min}^2},$$

$$f''(d) = -\frac{2}{d} + \frac{2c^2\beta_{\min}^4(n-k)}{(1 + 2cd\beta_{\min}^2)^2},$$

and inequality (10) yields $f''(d) \geq 0$. Therefore, for $\Pr[E_p] \to 0$, it suffices that,

$$n - k > C \max\left\{\frac{\log(p-k)}{\log(1 + \beta_{\min}^2)},\ \frac{k\log\left(\frac{p-k}{k}\right) + k}{\log(1 + k\beta_{\min}^2)}\right\} \qquad (11)$$


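for a large enough constant C. Now, given condition (11) above, we obtain a non-asymptotic
upper bound on the error probability by continuing from equation (9). To this end we have,

$$\frac{5}{2} + \log(k(p-k)) - \frac{n-k}{2}\log(1 + 2c\beta_{\min}^2) \leq \frac{5}{2} + \log(k(p-k)) - \frac{C}{2}\log(p-k) \leq \frac{5}{2} - \frac{C-4}{2}\log(p-k), \qquad (12)$$

since $2k < p$, and similarly,

$$k\left[\frac{5}{2} + \log\frac{p-k}{k}\right] - \frac{n-k}{2}\log(1 + 2ck\beta_{\min}^2) \leq k\left[\frac{5}{2} + \log\frac{p-k}{k}\right] - \frac{C}{2}\left[k\log\frac{p-k}{k} + k\right] \leq -\frac{C-5}{2}\left[k\log\frac{p-k}{k} + k\right]. \qquad (13)$$

In the end, under the conditions above, inequalities (12) and (13) together with the bound
obtained in inequality (9) yield,

$$\Pr[E_p] \leq k\,e^{5/2} \max\left\{(p-k)^{-\delta},\ \left(\frac{e(p-k)}{k}\right)^{-k\delta}\right\},$$

for $\delta = \frac{C-5}{2}$.
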



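Lemma 6. For Gaussian perturbation matrices, with $X_{ij} \sim N(0, 1)$, the average error probability
that the optimum decoder declares $\mathcal{F}$ is bounded by,

$$\Pr[\hat{\mathcal{T}}(y, X) = \mathcal{F} \mid \beta, \mathcal{T}] \leq e^{-\frac{n-k}{2}\log(1 + 2c\|\beta_{\mathcal{T}-\mathcal{F}}\|^2) + \frac{d}{2}},$$

with $d = |\mathcal{T} - \mathcal{F}|$ and $c = \frac{3 - 2\sqrt{2}}{2}$.

Proof. The column sets of $X_{\mathcal{F}}$ and $X_{\mathcal{T}-\mathcal{F}}$ are, by definition, disjoint and therefore these are
independent Gaussian random matrices with column spaces spanning random independent $|\mathcal{F}|$- and
$|\mathcal{T} - \mathcal{F}|$-dimensional subspaces, respectively. The Gaussian random vector $X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}$ has i.i.d.
Gaussian entries with variance $\|\beta_{\mathcal{T}-\mathcal{F}}\|^2$. Therefore, since the random Gaussian
vector $X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}$ is projected onto the subspace orthogonal to the random column space
of $X_{\mathcal{F}}$, the quantity $\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 / \|\beta_{\mathcal{T}-\mathcal{F}}\|^2$ is a chi-square random variable with
$n - k$ degrees of freedom. Thus,

$$\begin{aligned}
\Pr[\hat{\mathcal{T}}(y, X) = \mathcal{F} \mid \beta, \mathcal{T}] &= E_X\left\{\Pr[\hat{\mathcal{T}}(y, X) = \mathcal{F} \mid X, \beta, \mathcal{T}]\right\} \\
&\leq E_X\left\{e^{-c\|(I - \Pi_{\mathcal{F}})X_{\mathcal{T}-\mathcal{F}}\beta_{\mathcal{T}-\mathcal{F}}\|^2 + \frac{d}{2}}\right\} \\
&= e^{-\frac{n-k}{2}\log(1 + 2c\|\beta_{\mathcal{T}-\mathcal{F}}\|^2) + \frac{d}{2}}.
\end{aligned}$$

The first inequality follows from Theorem 1 and the second equality comes from the well-known
formula for the moment-generating function of a chi-square random variable, $E\,e^{t\chi^2_{n-k}} = (1 - 2t)^{-\frac{n-k}{2}}$, for $2t < 1$.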
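The moment-generating function identity used in the last step is easy to check by simulation. The sketch below is a plain Monte Carlo check with a fixed seed; the function names, sample size and parameter values are illustrative assumptions, and lemma6_bound simply evaluates the closed-form expression $e^{-\frac{n-k}{2}\log(1 + 2c\|\beta_{\mathcal{T}-\mathcal{F}}\|^2) + \frac{d}{2}}$.

```python
import math
import random


def chi_square_mgf(t, dof):
    """Closed form E[exp(t * chi2_dof)] = (1 - 2t)^(-dof / 2), valid for 2t < 1."""
    if 2 * t >= 1:
        raise ValueError("the moment-generating function is finite only for 2t < 1")
    return (1 - 2 * t) ** (-dof / 2)


def monte_carlo_mgf(t, dof, samples, seed=0):
    """Estimate E[exp(t * chi2_dof)] by summing squared standard normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
        total += math.exp(t * chi2)
    return total / samples


def lemma6_bound(n, k, d, beta_norm_sq):
    """exp(-(n - k) / 2 * log(1 + 2 c ||beta_{T-F}||^2) + d / 2)."""
    c = (3 - 2 * math.sqrt(2)) / 2
    return math.exp(-(n - k) / 2 * math.log(1 + 2 * c * beta_norm_sq) + d / 2)


exact = chi_square_mgf(-0.3, 4)            # (1 + 0.6)^(-2)
estimate = monte_carlo_mgf(-0.3, 4, 20000)  # should be close to the exact value
bound = lemma6_bound(n=50, k=5, d=2, beta_norm_sq=8.0)
```

Setting $t = -c\|\beta_{\mathcal{T}-\mathcal{F}}\|^2$ and using $n - k$ degrees of freedom in the closed form reproduces the exponent of Lemma 6 up to the factor $e^{d/2}$.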

3 Conclusion 

In this paper, we examined the probability that the optimal decoder declares an incorrect 
sparsity pattern. We obtained a sharp upper bound for any generic perturbation matrix, and 
this allowed us to calculate the error probability in the case of random perturbation matrices. In 
the special case when the entries of the perturbation matrix are i.i.d. normal random variables, 
we computed an accurate upper bound on the expected error probability. Sufficient conditions 
on exact sparsity pattern recovery were obtained, and they were shown to be stronger than 
those in previous results, including [8]. Moreover, these results match the corresponding necessary
condition presented in [3]. An interesting open problem is to extend the sufficient conditions 
derived in this work to non-Gaussian and sparse perturbation matrices. 